electronic.alchemy :: Quick Hacks


Quick Hacks

Created by hww3. Last updated by hww3, 15 years ago. Version #4.

Clean up trashed SQLite blobs

Sometimes, especially when converting data to SQLite, you'll find that fields holding binary data are marked internally as text. Pike will then try to treat the data as UTF-8 and convert it to a Pike string. Because binary data isn't always valid UTF-8, that conversion can fail, and the query throws an error. The fix is to re-store the offending data, making sure it's marked as binary rather than text.

The following snippet is an example of how to do this. In our example table, called object_versions, the offending field is called "contents". Since some of the rows are fine (perhaps those records are storing real text), we only re-store the rows that fail. To find the failing rows, fetch each row individually.

You can use the SQLite "CAST" operator to force the data marked as text to be retrieved as binary (BLOB) data. Then, you can put it back.

// collect the ids first, so each row can be fetched on its own
array ov = s->query("select id from object_versions");
foreach(ov; ; mapping v)
{
  int id = (int)v["object_versions.id"];
  // a plain fetch throws if contents is marked as text but isn't valid UTF-8
  mixed err = catch(s->query("select contents from object_versions where id=:id", ([":id": id])));
  if(err) // ah, a row with a problem. let's fix it...
  {
    werror("failed to fetch id %d\n", id);
    // CAST makes SQLite hand the field back as BLOB data instead of text
    mixed t = s->query("select CAST(contents as blob) as v from object_versions where id=:id", ([":id": id]));
    s->query("update object_versions set contents=:contents where id=:id", ([":id": id, ":contents": t[0]["object_versions.v"]]));
  }
}

Carrot2 Document Clustering

Interface with the Document Clustering Server from http://www.carrot2.org via XML-RPC.

object x = Protocols.XMLRPC.Client("http://localhost:8081/xmlrpc/processor");
// input data is an array whose length is a multiple of 4.
// each document input has 4 fields, so document n can be found at
// inputdata[(4*n) .. (4*n) + 3]
//
// all 4 fields are required and are: 
// [0]id, [1]url, [2]title, [3]excerpt
//
// the sample data contains 1 input document.
array inputdata = ({"id0", "http://www.google.com", "google", "the google search engine"});

array clusters = x["cluster.doCluster"]("test query",
      (["dcs.clusters.only":0]), ([]), inputdata)[0];

foreach(clusters;; mapping cl)
  write("%s (%d)\n", cl->label, sizeof(cl->documents));
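With more than one document, the flat four-fields-per-document layout described in the comments above is easy to get wrong by hand. The following is a minimal sketch of one way to build it. The helper name flatten_documents and the mapping keys "id", "url", "title" and "excerpt" are assumptions made for this example; they are not part of Carrot2 or of the snippet above, and the second sample document is made up.

```pike
// Hypothetical helper (not part of Carrot2 or the original snippet):
// flattens a list of document mappings into the flat array layout
// described above, four fields per document.
array(string) flatten_documents(array(mapping(string:string)) docs)
{
  array(string) flat = ({});
  foreach(docs;; mapping(string:string) d)
  {
    // all four fields are required, in this order
    foreach(({"id", "url", "title", "excerpt"});; string field)
    {
      if(!stringp(d[field]))
        error("document is missing required field %O\n", field);
      flat += ({ d[field] });
    }
  }
  return flat;
}

int main()
{
  array(string) inputdata = flatten_documents(({
    ([ "id": "id0", "url": "http://www.google.com",
       "title": "google", "excerpt": "the google search engine" ]),
    ([ "id": "id1", "url": "http://example.org/",
       "title": "example", "excerpt": "a made-up second document" ])
  }));

  // document n occupies inputdata[(4*n) .. (4*n) + 3]
  write("%d fields, %d documents\n", sizeof(inputdata), sizeof(inputdata) / 4);
  return 0;
}
```

The resulting array can be passed as the inputdata argument in the call above. Throwing on a missing field keeps the array length a multiple of 4, so a bad document fails early instead of shifting every later field by one position.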
